Speech, silence, music and noise classification of TV broadcast material
Authors
Abstract
Speech processing can be of great help in indexing and archiving TV broadcast material. Broadcasting station standards will soon be digital, and there will be a huge increase in the use of speech processing techniques both for maintaining the archives and for accessing them. This paper starts with a review of several techniques used for classifying speech, music and noise. Approaches that use Neural Networks (NN) or Hidden Markov Modelling (HMM) generally do not allow one to “look inside” the network or models to determine which aspects of the sounds are similar to each other. This makes it difficult for the researcher to determine which features of the audio are important and which can be ignored [1]. Furthermore, for archiving TV broadcast material, segment timing does not need to be as precise as when labelling speech corpora for speech recognition research: it is more important to have the correct label than the precise start and finish times of each segment. We present an application of information theory to the classification and automatic labelling of TV broadcast material into speech, music and noise. We use information theory to construct a decision tree from several different TV programs, which form the training data. We then apply this decision tree to a different set of TV programs, the test data, and present classification results on both data sets. At the frame level, the correct classification rate was 95.5% on the training data, while on the test data it ranged from 60.4% to 84.5%, depending on the TV program type. At the segment level, the correct recognition rate and accuracy on the training data were 100% and 95.1%, respectively, while on the test data %correct ranged from 80% to 100% and %accuracy from 64.7% to 100%.
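The abstract says only that information theory is used to build the decision tree from labelled training frames; it does not name the audio features or the exact splitting criterion. The sketch below assumes an ID3-style tree that, at each node, picks the binary threshold on a numeric per-frame feature with the highest information gain (entropy reduction) and stops when no split reduces entropy. The two features in the toy data, short-time energy and zero-crossing rate, and all function names are hypothetical illustrations, not the paper's feature set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting frames on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

def best_split(frames, labels):
    """Return (feature_index, threshold, gain) for the split with the highest gain."""
    best = (None, None, 0.0)
    for j in range(len(frames[0])):
        column = [f[j] for f in frames]
        # Thresholds at every observed value except the largest, so both sides are non-empty.
        for t in sorted(set(column))[:-1]:
            g = information_gain(column, labels, t)
            if g > best[2]:
                best = (j, t, g)
    return best

def build_tree(frames, labels, min_gain=1e-9):
    """Grow the tree recursively; a leaf is the majority label of its frames."""
    j, t, g = best_split(frames, labels)
    if j is None or g <= min_gain:
        return Counter(labels).most_common(1)[0][0]
    left = [(f, y) for f, y in zip(frames, labels) if f[j] <= t]
    right = [(f, y) for f, y in zip(frames, labels) if f[j] > t]
    return (j, t,
            build_tree([f for f, _ in left], [y for _, y in left], min_gain),
            build_tree([f for f, _ in right], [y for _, y in right], min_gain))

def classify(tree, frame):
    """Follow (feature_index, threshold, left, right) nodes down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if frame[j] <= t else right
    return tree

# Hypothetical toy frames: (short-time energy, zero-crossing rate).
frames = [(0.01, 0.1), (0.02, 0.2), (0.8, 0.3), (0.7, 0.35), (0.6, 0.05), (0.9, 0.04)]
labels = ["silence", "silence", "speech", "speech", "music", "music"]
tree = build_tree(frames, labels)
print(tree)  # (0, 0.02, 'silence', (1, 0.05, 'music', 'speech'))
```

Because every internal node is a readable (feature, threshold) test and every leaf is a plain label, the grown tree can be printed and inspected directly, which is the kind of transparency the abstract contrasts with NN and HMM classifiers.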
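The abstract reports results at both the frame and the segment level, and stresses that correct labels matter more than exact boundaries, but it does not say how per-frame decisions are grouped into segments. One plausible sketch, assumed here rather than taken from the paper, merges runs of identical frame labels into segments and optionally relabels runs shorter than a hypothetical `min_frames` threshold with the preceding label, trading boundary precision for fewer spurious segments.

```python
from itertools import groupby

def smooth_short_runs(labels, min_frames):
    """Relabel runs shorter than `min_frames` with the label of the preceding frame.

    `min_frames` is a hypothetical tuning parameter; a short run at the very
    start has no predecessor and is left unchanged.
    """
    out = []
    for label, run in groupby(labels):
        run_length = sum(1 for _ in run)
        if out and run_length < min_frames:
            out.extend([out[-1]] * run_length)
        else:
            out.extend([label] * run_length)
    return out

def frames_to_segments(labels):
    """Collapse consecutive identical frame labels into (label, start, end) segments.

    `start` is inclusive and `end` exclusive, both in frame indices; multiply
    by the frame shift to obtain times in seconds.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        run_length = sum(1 for _ in run)
        segments.append((label, start, start + run_length))
        start += run_length
    return segments

frame_labels = ["speech"] * 5 + ["noise"] + ["speech"] * 4 + ["music"] * 6
print(frames_to_segments(frame_labels))
# [('speech', 0, 5), ('noise', 5, 6), ('speech', 6, 10), ('music', 10, 16)]
print(frames_to_segments(smooth_short_runs(frame_labels, min_frames=3)))
# [('speech', 0, 10), ('music', 10, 16)]
```

The sequence of segment labels, ignoring the boundary indices, is what a segment-level %correct or %accuracy score would be computed on.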
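The abstract distinguishes segment-level %correct from %accuracy (100% versus 95.1% on the training data). A common convention, used by HTK's HResults tool, aligns the reference and recognised label sequences and counts hits H, deletions D, substitutions S and insertions I over N reference labels, giving %Correct = H/N × 100 and %Accuracy = (H − I)/N × 100, so only insertions separate the two. Assuming the paper follows this convention (the abstract does not say), a minimal sketch with unit edit costs is shown below; HResults' own alignment weights differ, and the function names are hypothetical.

```python
def frame_correct_rate(reference, hypothesis):
    """Percentage of frames whose hypothesised label equals the reference label."""
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("need two non-empty frame sequences of equal length")
    hits = sum(r == h for r, h in zip(reference, hypothesis))
    return 100.0 * hits / len(reference)

def align_counts(ref, hyp):
    """Unit-cost minimum-edit alignment; returns (hits, deletions, substitutions, insertions)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, classifying each alignment step.
    i, j = n, m
    hits = subs = dels = ins = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, dels, subs, ins

def percent_correct_and_accuracy(ref, hyp):
    """HTK-style %Correct = H/N * 100 and %Accuracy = (H - I)/N * 100."""
    hits, dels, subs, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * hits / n, 100.0 * (hits - ins) / n

reference = ["music", "speech", "music", "speech"]
hypothesis = ["music", "speech", "noise", "music", "speech"]
print(percent_correct_and_accuracy(reference, hypothesis))  # (100.0, 75.0)
```

In the example, one spurious noise segment leaves %Correct at 100 but pulls %Accuracy down to 75, the same pattern as the gap between the two segment-level training figures in the abstract.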
Similar resources
A Hierarchical Approach for Audio Stream Segmentation and Classification
This paper describes a hierarchical approach for fast audio stream segmentation and classification. With this approach, the audio stream is first segmented into audio clips by MBCR (Multiple sub-Bands spectrum Centroid relative Ratio) based histogram modeling. Then an MGM (Modified Gaussian modeling) based hierarchical classifier is adopted to put the segmented audio clips into six pre-defined...
A sound source classification system based on subband processing
A classification system that aims to recognize the presence of sounds from different sources is described. The types of audio signals considered are speech, music, noise and silence. Appropriate subband processing is applied for the characterization of each sound source. The algorithm operates in four steps to classify the contents of a given audio signal. The acoustical parameters and statistic...
The effects of background music on speech recognition accuracy
Recognition of broadcast data, such as TV and radio programs, is a topic of great interest. One of the problems with such data is the frequent presence of background music that degrades the performance of speech recognition systems. In this paper we examine the effects of different kinds of music on automatic speech recognition systems by comparing the effects of music with the relatively well-k...
Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion
Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a prior audio segmentation stage may help to improve the robustness of speech technologies like automatic speech recognition and speaker diarization. In this article, we present the eva...
Effects of Background Music on Phonological Short-term Memory
Immediate memory for visually presented verbal material is disrupted by concurrent speech, even when the speech is unattended and in a foreign language. Unattended noise does not produce a reliable decrement. These results have been interpreted in terms of a phonological short-term store that excludes non-speechlike sounds. The characteristics of this exclusion process were explored by studying...